Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
This paper is about semantic regular expressions (SemREs). This is a concept that was recently proposed by Smore (Chen et al. 2023) in which classical regular expressions are extended with a primitive to query external oracles such as databases and large language models (LLMs). SemREs can be used to identify lines of text containing references to semantic concepts such as cities, celebrities, political entities, etc. The focus in their paper was on automatically synthesizing semantic regular expressions from positive and negative examples. In this paper, we study themembership testing problem. First, we present a two-pass NFA-based algorithm to determine whether a stringwmatches a SemRErinO(|r|2|w|2+ |r| |w|3) time, assuming the oracle responds to each query in unit time. In common situations, where oracle queries are not nested, we show that this procedure runs inO(|r|2|w|2) time. Experiments with a prototype implementation of this algorithm validate our theoretical analysis, and show that the procedure massively outperforms a dynamic programming-based baseline, and incurs a ≈ 2 × overhead over the time needed for interaction with the oracle. Second, we establish connections between SemRE membership testing and the triangle finding problem from graph theory, which suggest that developing algorithms which are simultaneously practical and asymptotically faster might be challenging. Furthermore, algorithms for classical regular expressions primarily aim to optimize their time and memory consumption. In contrast, an important consideration in our setting is to minimize the cost of invoking the oracle. We demonstrate an Ω(|w|2) lower bound on the number of oracle queries necessary to make this determination.more » « lessFree, publicly-accessible full text available June 10, 2026
-
The traditional formulation of the program synthesis problem is to find a program that meets a logical correctness specification. When synthesis is successful, there is a guarantee that the implementation satisfies the specification. Unfortunately, synthesis engines are typically monolithic algorithms, and obscure the correspondence between the specification, implementation and user intent. In contrast, humans often include comments in their code to guide future developers towards the purpose and design of different parts of the codebase. In this paper, we introducesubspecificationsas a mechanism to augment the synthesized implementation with explanatory notes of this form. In this model, the user may ask for explanations of different parts of the implementation; the subspecification generated in response is a logical formula that describes the constraints induced on that subexpression by the global specification and surrounding implementation. We develop algorithms to construct and verify subspecifications and investigate their theoretical properties. We perform an experimental evaluation of the subspecification generation procedure, and measure its effectiveness and running time. Finally, we conduct a user study to determine whether subspecifications are useful: we find that subspecifications greatly aid in understanding the global specification, in identifying alternative implementations, and in debugging faulty implementations.more » « less
An official website of the United States government
